Strategic information system and agriculture (exploration server)

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Domain‐independent automatic keyphrase indexing with small training sets

Internal identifier: 000A14 (Main/Exploration); previous: 000A13; next: 000A15

Authors: Olena Medelyan [New Zealand]; Ian H. Witten [New Zealand]

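Source: Journal of the American Society for Information Science and Technology, vol. 59, no. 7, pp. 1026-1040 (2008-05)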

RBID : ISTEX:B7ADEE369E44E8C2FD9532F4554A0A0F18548E84

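French descriptors: Extraction information; Indexation; Indexation automatique; Système information; Vocabulaire contrôlé

English descriptors: Automatic indexing; Controlled vocabulary; Indexing; Information extraction; Information system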

Abstract

Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain‐specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.

Url: https://api.istex.fr/document/B7ADEE369E44E8C2FD9532F4554A0A0F18548E84/fulltext/pdf
DOI: 10.1002/asi.20790


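Affiliations: Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton 3240, New Zealand (both authors)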



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Domain‐independent automatic keyphrase indexing with small training sets</title>
<author>
<name sortKey="Medelyan, Olena" sort="Medelyan, Olena" uniqKey="Medelyan O" first="Olena" last="Medelyan">Olena Medelyan</name>
</author>
<author>
<name sortKey="Witten, Ian H" sort="Witten, Ian H" uniqKey="Witten I" first="Ian H." last="Witten">Ian H. Witten</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B7ADEE369E44E8C2FD9532F4554A0A0F18548E84</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1002/asi.20790</idno>
<idno type="url">https://api.istex.fr/document/B7ADEE369E44E8C2FD9532F4554A0A0F18548E84/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000669</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000669</idno>
<idno type="wicri:Area/Istex/Curation">000635</idno>
<idno type="wicri:Area/Istex/Checkpoint">000459</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000459</idno>
<idno type="wicri:doubleKey">1532-2882:2008:Medelyan O:domain:independent:automatic</idno>
<idno type="wicri:Area/Main/Merge">000A14</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:09-0056665</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000030</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000033</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000068</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000028</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000028</idno>
<idno type="wicri:doubleKey">1532-2882:2008:Medelyan O:domain:independent:automatic</idno>
<idno type="wicri:Area/Main/Merge">000A47</idno>
<idno type="wicri:Area/Main/Curation">000A14</idno>
<idno type="wicri:Area/Main/Exploration">000A14</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Domain‐independent automatic keyphrase indexing with small training sets</title>
<author>
<name sortKey="Medelyan, Olena" sort="Medelyan, Olena" uniqKey="Medelyan O" first="Olena" last="Medelyan">Olena Medelyan</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Nouvelle-Zélande</country>
<wicri:regionArea>Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton 3240</wicri:regionArea>
<wicri:noRegion>Hamilton 3240</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Nouvelle-Zélande</country>
</affiliation>
</author>
<author>
<name sortKey="Witten, Ian H" sort="Witten, Ian H" uniqKey="Witten I" first="Ian H." last="Witten">Ian H. Witten</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Nouvelle-Zélande</country>
<wicri:regionArea>Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton 3240</wicri:regionArea>
<wicri:noRegion>Hamilton 3240</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Nouvelle-Zélande</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Journal of the American Society for Information Science and Technology</title>
<title level="j" type="abbrev">J. Am. Soc. Inf. Sci.</title>
<idno type="ISSN">1532-2882</idno>
<idno type="eISSN">1532-2890</idno>
<imprint>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2008-05">2008-05</date>
<biblScope unit="volume">59</biblScope>
<biblScope unit="issue">7</biblScope>
<biblScope unit="page" from="1026">1026</biblScope>
<biblScope unit="page" to="1040">1040</biblScope>
</imprint>
<idno type="ISSN">1532-2882</idno>
</series>
<idno type="istex">B7ADEE369E44E8C2FD9532F4554A0A0F18548E84</idno>
<idno type="DOI">10.1002/asi.20790</idno>
<idno type="ArticleID">ASI20790</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1532-2882</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic indexing</term>
<term>Controlled vocabulary</term>
<term>Indexing</term>
<term>Information extraction</term>
<term>Information system</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Extraction information</term>
<term>Indexation</term>
<term>Indexation automatique</term>
<term>Système information</term>
<term>Vocabulaire contrôlé</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain‐specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Nouvelle-Zélande</li>
</country>
</list>
<tree>
<country name="Nouvelle-Zélande">
<noRegion>
<name sortKey="Medelyan, Olena" sort="Medelyan, Olena" uniqKey="Medelyan O" first="Olena" last="Medelyan">Olena Medelyan</name>
</noRegion>
<name sortKey="Medelyan, Olena" sort="Medelyan, Olena" uniqKey="Medelyan O" first="Olena" last="Medelyan">Olena Medelyan</name>
<name sortKey="Witten, Ian H" sort="Witten, Ian H" uniqKey="Witten I" first="Ian H." last="Witten">Ian H. Witten</name>
<name sortKey="Witten, Ian H" sort="Witten, Ian H" uniqKey="Witten I" first="Ian H." last="Witten">Ian H. Witten</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Agronomie/explor/SisAgriV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A14 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A14 | SxmlIndent | more
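
Once the record has been extracted with one of the commands above, its fields can be read programmatically. The following is a minimal Python sketch, not part of Dilib. It assumes the record text is available as a string (here an abbreviated copy of the record shown on this page). Because the `wicri:` prefix is not declared in the record as displayed, the sketch wraps it in a hypothetical `wrapper` element that binds the prefix to a placeholder URI; both the wrapper name and the URI are assumptions made for illustration, as is the function name `parse_record`.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record shown on this page (only the fields used below).
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Domain‐independent automatic keyphrase indexing with small training sets</title>
<author>
<name sortKey="Medelyan, Olena" first="Olena" last="Medelyan">Olena Medelyan</name>
</author>
<author>
<name sortKey="Witten, Ian H" first="Ian H." last="Witten">Ian H. Witten</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1002/asi.20790</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Assumption: placeholder namespace URI, used only so the parser accepts
# the "wicri:" prefix. It is not the real Wicri namespace.
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text):
    """Return the title, author names, and DOI found in a record string."""
    # Wrap the record so that the undeclared "wicri:" prefix is bound.
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{xml_text}</wrapper>'
    root = ET.fromstring(wrapped)
    title = root.findtext(".//titleStmt/title")
    authors = [name.text for name in root.findall(".//titleStmt/author/name")]
    doi = root.findtext(".//idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


if __name__ == "__main__":
    info = parse_record(RECORD)
    print(info["title"])
    print("; ".join(info["authors"]))
    print(info["doi"])
```

In practice the string passed to `parse_record` could be the output of `HfdSelect`, for example read from standard input, provided it contains no XML declaration of its own.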

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Agronomie
   |area=    SisAgriV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:B7ADEE369E44E8C2FD9532F4554A0A0F18548E84
   |texte=   Domain‐independent automatic keyphrase indexing with small training sets
}}

This area was generated with Dilib version V0.6.28.
Data generation: Wed Mar 29 00:06:34 2017. Site generation: Tue Mar 12 12:44:16 2024